

# Low Power and Area-Efficient 16-Bit Modified SQRT Carry Select Adder Structure

Balasubramani M<sup>1</sup>, Naveen R<sup>2</sup>, Prasanth C<sup>3</sup>

Assistant Professor, Department of ECE, Info Institute of Engineering, Coimbatore, India<sup>1, 2</sup>

UG Student, Department of ECE, Info Institute of Engineering, Coimbatore, India<sup>3</sup>

**Abstract:** The Carry Select Adder (CSLA) is one of the fastest adders used in many data-processing processors to perform fast arithmetic functions. The structure of the CSLA shows that there is scope for reducing its area and power consumption. This paper uses a simple and efficient gate-level modification to significantly reduce the area and power of the CSLA. Based on this modification, 8-b, 16-b, 32-b, and 64-b square-root CSLA (SQRT CSLA) architectures have been developed and compared with the regular SQRT CSLA architecture. The proposed design has reduced area and power compared with the regular SQRT CSLA, with only a slight increase in delay. This work evaluates the performance of the proposed designs in terms of delay, area, and power, using layout in a 0.18-µm CMOS process technology. The obtained results show that the proposed CSLA structure is better than the regular SQRT CSLA.

Keywords: Ripple Carry Adders, Binary to Excess-1 Converter, SQRT CSLA.

## I. INTRODUCTION

Design of area- and power-efficient high-speed data path logic systems is one of the most substantial areas of research in VLSI system design. In digital adders, the speed of addition is limited by the time required to propagate a carry through the adder. The sum for each bit position in an elementary adder is generated sequentially only after the previous bit position has been summed and a carry propagated into the next position [1]. The CSLA is used in many computational systems to alleviate the problem of carry propagation delay by independently generating multiple carries and then selecting a carry to generate the sum. However, the CSLA is not area efficient because it uses multiple pairs of Ripple Carry Adders (RCA) to generate partial sum and carry by considering carry input Cin=0 and Cin=1; the final sum and carry are then selected by multiplexers. The basic idea of this work is to use a Binary to Excess-1 Converter (BEC) instead of the RCA with Cin=1 in the regular CSLA to achieve lower area and power consumption. The main advantage of the BEC logic comes from its smaller number of logic gates compared with the n-bit Full Adder (FA) structure. Section II presents the detailed structure and function of the RCA, multiplexer, and BEC logic. The SQRT CSLA has been chosen for comparison with the proposed design because it has a more balanced delay and requires lower power and area. The power and area evaluation methodology of the regular and modified 16-b SQRT CSLA is presented in Sections III and IV, respectively, and the two designs are compared in Section V. Finally, the paper is concluded in Section VI.

A. Delay and Area Evaluation Methodology of the Basic Adder Blocks

The AND, OR, and Inverter (AOI) implementation of an XOR gate is shown in Fig.1. The gates between the dotted lines perform their operations in parallel, and the number attached to each gate indicates the delay contributed by that gate. The delay and area evaluation methodology considers all gates to be made up of AND, OR, and Inverter, each having a delay of 1 unit and an area of 1 unit. We then add up the number of gates in the longest path of a logic block, which determines its maximum delay. The area evaluation is done by counting the total number of AOI gates required for each logic block [1]. Based on this approach, the CSLA adder blocks of 2:1 multiplexer, Half Adder (HA), and FA are evaluated and listed in Table I.

Fig.1 Delay and Area evaluation of XOR gate

TABLE I DELAY AND AREA COUNT OF THE BASIC BLOCKS OF CSLA

| Adder Blocks    | Delay | Area |
|-----------------|-------|------|
| XOR             | 3     | 5    |
| 2:1 MULTIPLEXER | 3     | 4    |
| HALF ADDER      | 3     | 6    |
| FULL ADDER      | 6     | 13   |
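The unit-delay, unit-area counting behind Table I can be reproduced with a small Python sketch. It assumes the usual AOI decompositions, which the text does not spell out beyond Fig.1: XOR as (~a & b) | (a & ~b), the 2:1 multiplexer as (~s & d0) | (s & d1), and the full-adder carry as (a & b) | ((a ^ b) & c). The class and function names are illustrative only.

```python
class AOICounter:
    """Unit-delay, unit-area model of AND, OR and NOT gates.

    A signal is represented only by its arrival time. Every gate adds one
    unit of area and is ready one unit after its latest input arrives.
    """

    def __init__(self):
        self.area = 0

    def _gate(self, *arrivals):
        self.area += 1
        return max(arrivals) + 1

    def inv(self, a):
        return self._gate(a)

    def and2(self, a, b):
        return self._gate(a, b)

    def or2(self, a, b):
        return self._gate(a, b)

    def xor2(self, a, b):
        # assumed AOI form: (~a & b) | (a & ~b)
        return self.or2(self.and2(self.inv(a), b), self.and2(a, self.inv(b)))

    def mux2(self, d0, d1, s):
        # assumed AOI form: (~s & d0) | (s & d1)
        return self.or2(self.and2(self.inv(s), d0), self.and2(s, d1))

    def half_adder(self, a, b):
        return self.xor2(a, b), self.and2(a, b)            # (sum, carry)

    def full_adder(self, a, b, c):
        p = self.xor2(a, b)
        s = self.xor2(p, c)
        cout = self.or2(self.and2(a, b), self.and2(p, c))  # assumed carry form
        return s, cout


def block_cost(block):
    """(delay, area) of one basic block with every input arriving at t = 0."""
    g = AOICounter()
    if block == "XOR":
        outputs = (g.xor2(0, 0),)
    elif block == "2:1 MUX":
        outputs = (g.mux2(0, 0, 0),)
    elif block == "HA":
        outputs = g.half_adder(0, 0)
    elif block == "FA":
        outputs = g.full_adder(0, 0, 0)
    else:
        raise ValueError(f"unknown block: {block}")
    return max(outputs), g.area


for name in ("XOR", "2:1 MUX", "HA", "FA"):
    print(name, block_cost(name))
```

Under these assumptions the script prints (3, 5), (3, 4), (3, 6) and (6, 13) for XOR, 2:1 MUX, HA and FA, which matches Table I.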

### **II. SQRT CSLA DESIGN IMPLEMENTATION**

A. Ripple-Carry Adder

It is possible to create a logical circuit using multiple full adders to add N-bit numbers. Each full adder takes as input a Cin, which is the Cout of the previous adder. This kind of adder is called a ripple-carry adder, since each carry bit "ripples" to the next full adder. The first full adder may be replaced by a half adder when there is no carry input. The layout of a ripple-carry adder is simple, which allows for fast design time. However, the ripple-carry adder is relatively slow, since each full adder must wait for the carry bit to be calculated by the previous full adder. The gate delay can easily be calculated by inspection of the full adder circuit: each full adder requires three levels of logic. In a 32-bit ripple-carry adder there are 32 full adders, so the critical path delay is 2 (from the inputs to the carry in the first adder) + 31\*3 (for carry propagation in the later adders) = 95 gate delays [3]. Some other multi-bit adder architectures break the adder into blocks, and the length of these blocks can be varied based on the propagation delay of the circuits to optimize computation time. These block-based adders include the carry-skip adder, which determines P and G values for each block rather than for each bit, and the carry select adder, which pre-generates the sum and carry values for either possible carry input (0 or 1) to the block and uses multiplexers to select the appropriate result once the carry bit is known. In the ripple-carry adder, the output of a stage is known only after the carry generated by the previous stage is produced [3]. Thus, the sum of the most significant bit is only available after the carry signal has rippled through the adder from the least significant stage to the most significant stage. As a result, the final sum and carry bits are valid only after a considerable delay.
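As a minimal behavioral sketch (not the authors' circuit), the ripple-carry principle and the delay count quoted above can be written in Python. `ripple_carry_add` and `rca_critical_path` are hypothetical helper names. The delay formula uses the counting rule stated in the paragraph (2 levels to the first carry, 3 per later full adder), which is a different model from the AOI unit counts of Table I.

```python
def ripple_carry_add(a, b, n, cin=0):
    """Add two n-bit unsigned integers with a chain of n full adders.

    Returns (sum, carry_out); the carry out of bit i feeds bit i + 1.
    """
    carry, total = cin, 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i              # full-adder sum
        carry = (ai & bi) | (carry & (ai ^ bi))      # full-adder carry out
    return total, carry


def rca_critical_path(n):
    """Gate delays on the carry path using the rule quoted in the text:
    2 levels from the inputs to the first carry, then 3 per later full adder."""
    return 2 + 3 * (n - 1)


print(ripple_carry_add(0b10110010, 0b01000001, 8))   # (243, 0), i.e. 11110011
print(rca_critical_path(32))                          # 95
```

The 8-bit operands are the ones used in the simulation outputs of Fig.7 and Fig.8.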

TABLE II POWER AND AREA ANALYSIS FOR RIPPLE-CARRY ADDER

| RCA(bit) | Power(mW) | Area |
|----------|-----------|------|
| RCA(2)   | 0.73      | 108  |
| RCA(3)   | 0.92      | 162  |
| RCA(4)   | 1.12      | 216  |
| RCA(5)   | 1.31      | 270  |

Table II shows that the power and area of the ripple-carry adder grow linearly with its width: each additional bit adds about 0.19 to 0.20 mW and 54 area units.

Fig.2 Schematic for Ripple-Carry Adder

## B. Multiplexer

A multiplexer, also called a data selector, is a combinational logic circuit that selects one of 2<sup>n</sup> inputs; it has n select lines, which identify the input to be provided to the output, and only one output. Larger multiplexers can be built from smaller ones; for example, an 8:1 multiplexer can be implemented using two 4:1 multiplexers and one 2:1 multiplexer. Multiplexers are mainly used to increase the amount of data that can be sent over a network within a certain amount of time and bandwidth. An electronic multiplexer makes it possible for several signals to share one device or resource, for example one A/D converter or one communication line, instead of having one device per input signal.



Fig.3 Schematic diagram for Multiplexer
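The composition of a larger multiplexer from smaller ones can be sketched behaviorally in Python. This is an illustration of the 8:1 example in the paragraph, not the schematic of Fig.3; `mux2`, `mux4` and `mux8` are hypothetical names.

```python
def mux2(d0, d1, s):
    """2:1 multiplexer: passes d1 when s = 1, otherwise d0."""
    return d1 if s else d0


def mux4(d, s1, s0):
    """4:1 multiplexer built from three 2:1 multiplexers; d is a 4-item sequence."""
    return mux2(mux2(d[0], d[1], s0), mux2(d[2], d[3], s0), s1)


def mux8(d, s2, s1, s0):
    """8:1 multiplexer from two 4:1 multiplexers and one 2:1 multiplexer."""
    return mux2(mux4(d[0:4], s1, s0), mux4(d[4:8], s1, s0), s2)


inputs = ["i0", "i1", "i2", "i3", "i4", "i5", "i6", "i7"]
print(mux8(inputs, 1, 0, 1))   # select lines 101 -> "i5"
```

The most significant select line drives the final 2:1 stage, so the two 4:1 multiplexers each handle one half of the inputs.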

TABLE III POWER AND AREA ANALYSIS FOR MULTIPLEXER

| Group   | Power(nW) | Area |
|---------|-----------|------|
| Group 2 | 13.66     | 36   |
| Group 3 | 18.21     | 48   |
| Group 4 | 22.72     | 60   |
| Group 5 | 27.82     | 72   |

Table III shows that the power and area of the multiplexer stage grow steadily with group size: the area increases by 12 units per group and the power by about 4.5 to 5.1 nW per group.

#### C. Binary to Excess-1 Converter

Code conversions are essential in digital systems. As discussed in Section I, the speed of addition in digital adders is limited by the time required to propagate a carry through the adder [1]. The CSLA is used in many computational systems to alleviate the problem of carry propagation delay. However, the CSLA is not area efficient because it uses multiple pairs of RCAs (ripple carry adders) to generate partial sum and carry by considering carry input Cin=0 and Cin=1; the final sum and carry are then selected by multiplexers [2]. The power and area of the CSLA can be reduced by using a BEC instead of the RCA with Cin=1.

In order to achieve efficient low-power VLSI circuits, we replace the RCA with a BEC. To replace an n-bit RCA, an (n+1)-bit BEC is required. A combinational circuit that combines an adder with a multiplexer, a binary to excess-1 code converter, and a ripple-carry adder is called a hybrid adder. Here the binary to excess-1 converter has a complex CMOS layout in terms of area, delay, and power consumption; hence an attempt has been made to develop a converter with low power consumption and less complexity. A 4-b BEC is shown in Fig.4.



Fig.4 Block diagram for 4-bit BEC

Fig.5 illustrates how the basic function of the CSLA is obtained by using the 4-bit BEC together with an 8:4 multiplexer. One input of the 8:4 multiplexer receives (B3, B2, B1, B0) directly, and the other input receives the BEC output. The two possible partial results are thus available in parallel, and the multiplexer selects either the BEC output or the direct inputs according to the control signal Cin.



Fig.5 4-b BEC with 8:4 Multiplexer

The importance of the BEC logic stems from the large silicon area reduction when CSLAs with large numbers of bits are designed [3]. The Boolean expressions of the 4-bit BEC are listed below (the functional symbols ~, &, and ^ denote NOT, AND, and XOR, respectively).

X0=~B0, X1=B0^B1, X2=B2^(B0&B1), X3=B3^(B0&B1&B2).

| BEC(bit) | Power(nW) | Area |
|----------|-----------|------|
| BEC(3)   | 10.13     | 22   |
| BEC(4)   | 18.42     | 42   |
| BEC(5)   | 26.72     | 62   |
| BEC(6)   | 35.06     | 82   |

Table IV shows that the power and area of the Binary to Excess-1 converter grow linearly with its width: each additional bit adds about 8.3 nW and 20 area units.
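The Boolean expressions for X0 to X3 can be checked with a short Python sketch: it evaluates them bit by bit and confirms that the converter outputs B + 1 (modulo 16) for every 4-bit input. `bec4` and `bec_with_mux` are hypothetical names; the second function is a behavioral stand-in for the 8:4 multiplexer stage of Fig.5.

```python
def bec4(b):
    """4-bit Binary to Excess-1 Converter built from the Boolean expressions above."""
    b0, b1, b2, b3 = ((b >> i) & 1 for i in range(4))
    x0 = b0 ^ 1                 # X0 = ~B0 (single-bit NOT)
    x1 = b0 ^ b1                # X1 = B0 ^ B1
    x2 = b2 ^ (b0 & b1)         # X2 = B2 ^ (B0 & B1)
    x3 = b3 ^ (b0 & b1 & b2)    # X3 = B3 ^ (B0 & B1 & B2)
    return x0 | (x1 << 1) | (x2 << 2) | (x3 << 3)


def bec_with_mux(b, cin):
    """8:4 multiplexer stage: Cin = 1 selects the BEC output, Cin = 0 the direct input."""
    return bec4(b) if cin else b


# The BEC adds one modulo 16 for every 4-bit input
assert all(bec4(b) == (b + 1) % 16 for b in range(16))
print(format(bec_with_mux(0b0111, 1), "04b"))   # 1000
```

Each output bit flips exactly when all lower input bits are 1, which is the "add one" carry chain expressed without full adders.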



Fig.6 Schematic for BEC

### III. POWER AND AREA EVALUATION METHODOLOGY OF REGULAR 16-B SQRT CSLA

The structure of the 16-b regular SQRT CSLA (Fig.9) has five groups of RCAs of different sizes. In the delay and area evaluation of each group, the numerals in brackets specify the delay values; e.g., sum2 requires 10 gate delays [4].

The steps leading to the evaluation are as follows:

1. The group2 has two sets of 2-b RCA. Based on the delay values of Table I, the arrival time of the selection input c1 [t=7] of the 6:3 multiplexer is earlier than s3 [t=8] and later than s2 [t=6]. Thus, sum3 [t=11] is the sum of the arrival time of s3 and the multiplexer delay [t=3], and sum2 [t=10] is the sum of the arrival time of c1 and the multiplexer delay.

2. Except for group2, the arrival time of the multiplexer selection input is always greater than the arrival time of the data outputs from the RCAs. Thus, the delays of group3 to group5 are determined, respectively, as follows:

 $\{c_6, \text{sum}[6:4]\} = c_3\,[t=10] + \text{multiplexer}$

 $\{c_{10}, \text{sum}[10:7]\} = c_6\,[t=13] + \text{multiplexer}$

 $\{c_{\text{out}}, \text{sum}[15:11]\} = c_{10}\,[t=16] + \text{multiplexer}$

3. One set of 2-b RCA in group2 has 2 FAs for Cin=1, and the other set has 1 FA and 1 HA for Cin=0. Based on the area counts of Table I, the total gate count of group2 is determined as follows:

Gate count = 57 (FA + HA + Multiplexer)

FA = 39 (3\*13), HA = 6 (1\*6), Multiplexer = 12 (3\*4)
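The arithmetic in steps 1 to 3 can be checked with a short Python sketch. The `mux_out` rule (the multiplexer output is valid 3 units after the later of its data and selection inputs) is our reading of how the steps combine arrival times, and cout [t=19] is derived by applying step 2 once more rather than stated in the text.

```python
MUX_DELAY = 3                                 # 2:1 multiplexer delay, Table I
AREA = {"FA": 13, "HA": 6, "MUX": 4}          # AOI gate counts, Table I


def mux_out(data_t, select_t):
    """Assumed block-level rule: the multiplexer output is valid one
    multiplexer delay after the later of its data input and selection input."""
    return max(data_t, select_t) + MUX_DELAY


# Step 1: group2 arrival times of the regular SQRT CSLA
c1, s2, s3 = 7, 6, 8
sum2 = mux_out(s2, c1)   # 10: limited by c1
sum3 = mux_out(s3, c1)   # 11: limited by s3

# Step 2: from group3 on, the selection carry is the later input, so each
# group carry is the previous group carry plus one multiplexer delay
c3 = 10
c6 = c3 + MUX_DELAY      # 13
c10 = c6 + MUX_DELAY     # 16
cout = c10 + MUX_DELAY   # 19

# Step 3: group2 area = 3 FA + 1 HA + a 6:3 multiplexer (three 2:1 multiplexers)
group2_area = 3 * AREA["FA"] + 1 * AREA["HA"] + 3 * AREA["MUX"]
print(sum2, sum3, c6, c10, cout, group2_area)   # 10 11 13 16 19 57
```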

## IV. POWER AND AREA EVALUATION METHODOLOGY OF MODIFIED 16-B SQRT CSLA

The proposed 16-b SQRT CSLA uses a BEC in place of the RCA with Cin=1 to optimize area and power. We again split the structure into five groups. The delay and area estimation of each group is described below [4].



The steps leading to the evaluation are given here:

1. The group2 has one 2-b RCA, which has 1 FA and 1 HA for Cin=0. Instead of another 2-b RCA with Cin=1, a 3-b BEC is used, which adds one to the output of the 2-b RCA. Based on the delay values, the arrival time of the selection input c1 [t=7] of the 6:3 multiplexer is earlier than s3 [t=9] and c3 [t=10] and later than s2 [t=4]. Thus, sum3 depends on s3 and the multiplexer delay, and the final c3 (output from the multiplexer) depends on the partial c3 (input to the multiplexer) and the multiplexer delay. The sum2 depends on c1 and the multiplexer delay [5].

Fig.7 Simulation output for regular CSLA: Input A = 10110010, Input B = 01000001, Output = 11110011.

2. For the remaining groups, the arrival time of the multiplexer selection input is always greater than the arrival time of the data inputs from the BECs. Thus, the delay of the remaining groups depends on the arrival time of the multiplexer selection input and the multiplexer delay.

3. The area count of group2 is determined as follows:

Gate count = 43 (FA + HA + Multiplexer + BEC)

FA = 13 (1\*13), HA = 6 (1\*6), AND = 1, NOT = 1, XOR = 10 (2\*5), Multiplexer = 12 (3\*4)



Fig.8 Simulation output for modified CSLA: Input A = 10110010, Input B = 01000001, Output = 11110011
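The group2 figures of the modified SQRT CSLA can be checked the same way. In this Python sketch, the 3-b BEC composition (one NOT, two XOR, one AND) is taken from the 3-bit case of the BEC expressions in Section II-C (X0 = ~B0, X1 = B0 ^ B1, X2 = B2 ^ (B0 & B1)), and the sum3 and final c3 times are derived with an assumed multiplexer rule (later input plus 3 units); they are not stated in the text.

```python
MUX_DELAY = 3
AREA = {"FA": 13, "HA": 6, "MUX": 4, "XOR": 5, "AND": 1, "NOT": 1}   # Table I


def mux_out(data_t, select_t):
    # assumed rule: output valid one multiplexer delay after the later input
    return max(data_t, select_t) + MUX_DELAY


# Step 1: arrival times in group2 of the modified SQRT CSLA
c1, s2, s3, c3_partial = 7, 4, 9, 10
sum2 = mux_out(s2, c1)            # 10
sum3 = mux_out(s3, c1)            # 12
c3 = mux_out(c3_partial, c1)      # 13

# Step 3: one 2-b RCA (1 FA + 1 HA), a 3-b BEC (1 NOT, 2 XOR, 1 AND)
# and the 6:3 multiplexer (three 2:1 multiplexers)
modified = (AREA["FA"] + AREA["HA"] + AREA["NOT"] + 2 * AREA["XOR"]
            + AREA["AND"] + 3 * AREA["MUX"])
regular = 3 * AREA["FA"] + AREA["HA"] + 3 * AREA["MUX"]   # 57, Section III
print(sum2, sum3, c3, modified, regular)   # 10 12 13 43 57
print(f"group2 gate-count saving: {1 - modified / regular:.1%}")   # 24.6%
```

Under this model, the BEC saves gates in group2 at the cost of later sum3 and final c3 outputs, which is consistent with the small delay penalty reported in Section VI.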



Fig.9 Block diagram of 16-bit regular SQRT CSLA structure.





Fig.10 Block diagram of 16-bit modified SQRT CSLA structure
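A behavioral Python sketch can confirm that the regular (Fig.9) and modified (Fig.10) structures compute the same 16-bit sum. It assumes group widths of 2, 2, 3, 4, and 5 bits (consistent with sum[6:4], sum[10:7], and sum[15:11] in Section III), with group1 a single 2-b RCA driven by the adder carry input. The n-bit `bec` helper generalizes the 4-bit expressions of Section II-C, which is our extrapolation. The model covers function only, not gate-level timing or area; all names are hypothetical.

```python
GROUP_WIDTHS = (2, 2, 3, 4, 5)   # assumed group sizes, LSB group first (16 bits)


def rca(a, b, n, cin):
    """n-bit ripple-carry adder on integers; returns (sum, carry_out)."""
    carry, total = cin, 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total, carry


def bec(x, n):
    """n-bit Binary to Excess-1 Converter: X_i = B_i ^ (B_0 & ... & B_(i-1))."""
    out, lower_all_ones = 0, 1
    for i in range(n):
        bi = (x >> i) & 1
        out |= (bi ^ lower_all_ones) << i
        lower_all_ones &= bi
    return out


def sqrt_csla16(a, b, cin=0, modified=True):
    """Behavioral 16-bit SQRT CSLA; returns (sum, carry_out)."""
    carry, result, pos = cin, 0, 0
    for k, n in enumerate(GROUP_WIDTHS):
        mask = (1 << n) - 1
        ga, gb = (a >> pos) & mask, (b >> pos) & mask
        if k == 0:
            s, c = rca(ga, gb, n, carry)     # group1: single RCA fed by cin
        else:
            s0, c0 = rca(ga, gb, n, 0)       # RCA with Cin = 0
            if modified:
                # (n+1)-bit BEC on {c0, s0} replaces the RCA with Cin = 1
                x = bec((c0 << n) | s0, n + 1)
                s1, c1 = x & mask, x >> n
            else:
                s1, c1 = rca(ga, gb, n, 1)   # regular: second RCA with Cin = 1
            # multiplexer: the incoming group carry selects the result
            s, c = (s1, c1) if carry else (s0, c0)
        result |= s << pos
        carry = c
        pos += n
    return result, carry


print(sqrt_csla16(0b10110010, 0b01000001))   # (243, 0), as in Fig.7 and Fig.8
```

The (n+1)-bit BEC never overflows: the Cin = 0 result {c0, s0} is at most 2<sup>n+1</sup> - 2, so adding one always fits in n+1 bits. This is why an (n+1)-bit BEC suffices to replace an n-bit RCA.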

## V. POWER AND AREA COMPARISON OF REGULAR AND MODIFIED 16-B SQRT CSLA

TABLE V COMPARISONS BETWEEN POWER AND AREA

| ADDER        | AREA (µm²) | AVERAGE POWER |
|--------------|------------|---------------|
| Regular CSA  | 1147       | 0.8063 mW     |
| Modified CSA | 805        | 0.0561 mW     |

TABLE IV COMPARISON OF VARIOUS PARAMETERS

| Parameters            | Regular 16-bit | Proposed 16-bit |
|-----------------------|----------------|-----------------|
| MOSFETs               | 1836           | 1288            |
| MOSFET geometries     | 8              | 8               |
| Voltage sources       | 35             | 34              |
| Independent nodes     | 998            | 705             |
| Boundary nodes        | 36             | 35              |
| Sub-circuit instances | 361            | 251             |
| Total nodes           | 1034           | 740             |

Table V shows that the modified CSA has a lower area (805 µm² versus 1147 µm²) and a lower average power (0.0561 mW versus 0.8063 mW) than the regular CSA.
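The relative savings implied by Tables IV and V can be computed directly. This Python sketch only takes ratios of the tabulated 16-b values; the resulting percentages are not figures reported by the authors, and they differ from the 64-b percentages quoted in Section VI.

```python
def relative_reduction(regular, modified):
    """Fractional reduction of the modified design relative to the regular one."""
    return (regular - modified) / regular


# (regular, modified) pairs copied from Table V and Table IV
metrics = {
    "area (um^2)": (1147, 805),
    "average power (mW)": (0.8063, 0.0561),
    "MOSFETs": (1836, 1288),
    "total nodes": (1034, 740),
}

for name, (regular, modified) in metrics.items():
    print(f"{name}: {relative_reduction(regular, modified):.1%} lower in the modified design")
```

The area and MOSFET-count reductions come out nearly identical (about 29.8%), which is what one would expect if area scales with transistor count.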

## **VI. CONCLUSION**

A simple approach is proposed in this paper to reduce the area and power of the SQRT CSLA architecture. The reduced number of gates in this work offers a great advantage in reducing both the area and the total power. The comparison shows that the modified SQRT CSLA has a slightly larger delay (only 3.76%), but the area and power of the 64-b modified SQRT CSLA are significantly reduced, by 17.4% and 15.4%, respectively. The power-delay product and area-delay product of the proposed design are also reduced compared with the existing system.

With its reduced area and power, the proposed architecture is well suited for VLSI hardware implementation. It would be interesting to test the design of a modified 128-b SQRT CSLA.

#### REFERENCES

- [1] O. J. Bedrij, "Carry-select adder," IRE Trans. Electron. Comput., pp. 340-344, 1962.
- [2] B. Ramkumar, H. M. Kittur, and P. M. Kannan, "ASIC implementation of modified faster carry save adder," Eur. J. Sci. Res., vol. 42, no. 1, pp. 53-58, 2010.
- [3] T. Y. Chang and M. J. Hsiao, "Carry-select adder using single ripple carry adder," Electron. Lett., vol. 34, no. 22, pp. 2101-2103, Oct. 1998.
- [4] Y. Kim and L.-S. Kim, "64-bit carry-select adder with reduced area," Electron. Lett., vol. 37, no. 10, pp. 614-615, May 2001.
- [5] J. M. Rabaey, Digital Integrated Circuits: A Design Perspective. Upper Saddle River, NJ: Prentice-Hall, 2001.
- [6] Y. He, C. H. Chang, and J. Gu, "An area efficient 64-bit square root carry-select adder for low power applications," in Proc. IEEE Int. Symp. Circuits Syst., 2005, vol. 4, pp. 4082-4085.
- [7] Cadence, "Encounter user guide," Version 6.2.4, March 2008.

#### **BIOGRAPHIES**

**Dr. M. Balasubramani** received his B.E. (ECE) degree from BIT, Sathy in 2008. He obtained his M.E. (VLSI Design) degree from Bannari Amman Institute of Technology, Sathy in 2010. He was awarded a Ph.D. in I&C Engineering from Anna University, Chennai in 2014. He has 6 years of teaching experience at college level. Currently he is working as Assistant Professor, ECE at Info Institute of Engineering, Coimbatore. He has published 13 papers in International and National Journals and around seven papers in International and National Conferences conducted both in India and abroad. His areas of interest are Bio Signal Processing, Soft Computing, VLSI Design and Communication Engineering. He is a life member of IETE, ISDR and ISTE.





**Mr. R. Naveen** received his B.E. (EEE) degree from CSI College of Engineering, Ooty in 2005. He obtained his M.E. (VLSI Design) degree from KSR, Tiruchengode in 2007. He is pursuing his doctoral programme as a research scholar in Low Power VLSI Design under the guidance of Dr. K. Thanushkodi at the Department of ECE, Akshaya College of Engineering & Technology, Coimbatore. He has 9 years of teaching experience at college level. Currently he is working as Assistant Professor, ECE at Info Institute of Engineering, Coimbatore. He has published 10 papers in International and National Journals and around 6 papers in International and National Conferences conducted both in India and abroad. His areas of interest are Signal Processing, VLSI Design and Communication Engineering. He is a life member of IETE, ISDR and ISTE.